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Understanding  how  host  load  changes  over  time  is  instrumental  in 
predicting  the  execution  time  of  tasks  or  jobs,  such  as  in  dynamic  load 
balancing  and  distributed  soft  real-time  systems.  To  improve  this 
understanding,  we  collected  week-long,  1  Hz  resolution  traces  of  the 
Digital  Unix  5  second  exponential  load  average  on  over  35  different 
machines  including  production  and  research  cluster  machines,  compute 
servers,  and  desktop  workstations.  Separate  sets  of  traces  were 
collected  at  two  different  tirries  of  the  year.  The  traces  capture  all  of  the 
dynamic  load  information  available  to  user-level  programs  on  these 
machines.  We  present  a  detailed  statistical  analysis  of  these  traces  here, 
including  summary  statistics,  distributions,  and  time  series  analysis 
results.  Two  significant  new  results  are  that  load  is  self-similar  and  that  it 
displays  epochal  behavior.  All  of  the  traces  exhibit  a  high  degree  of 
self-similarity  with  Hurst  parameters  ranging  from  0.73  to  0.99,  strongly 
biased  toward  the  top  of  that  range.  The  traces  also  display  epochal 
behavior  in  that  the  local  frequency  content  of  the  load  signal  remains 
quite  stable  for  long  periods  of  time  (150-450  seconds  mean)  and 
changes  abruptly  at  epoch  boundaries.  Despite  these  complex 
behaviors,  we  have  found  that  relatively  simple  linear  models  are 
sufficient  for  short-range  host  load  prediction. 
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Abstract 


Understanding  how  host  load  changes  over  time  is  instrumental  in  predicting  the  execution  time  of 
tasks  or  jobs,  such  as  in  dynamic  load  balancing  and  distributed  soft  real-time  systems.  To  improve 
this  understanding,  we  collected  week-long,  1  Hz  resolution  traces  of  the  Digital  Unix  5  second 
exponential  load  average  on  over  35  different  machines  including  production  and  research  cluster 
machines,  compute  servers,  and  desktop  workstations.  Separate  sets  of  traces  were  collected  at  two 
different  times  of  the  year.  The  traces  capture  all  of  the  dynamic  load  information  available  to  user- 
level  programs  on  these  machines.  We  present  a  detailed  statistical  analysis  of  these  traces  here, 
including  summary  statistics,  distributions,  and  time  series  analysis  results.  Two  significant  new 
results  are  that  load  is  self-similar  and  that  it  displays  epochal  behavior.  All  of  the  traces  exhibit 
a  high  degree  of  self-similarity  with  Hurst  parameters  ranging  from  0.73  to  0.99,  strongly  biased 
toward  the  top  of  that  range.  The  traces  also  display  epochal  behavior  in  that  the  local  frequency 
content  of  the  load  signal  remains  quite  stable  for  long  periods  of  time  (150-450  seconds  mean) 
and  changes  abruptly  at  epoch  boundaries.  Despite  these  complex  behaviors,  we  have  found  that 
relatively  simple  linear  models  are  sufficient  for  short-range  host  load  prediction. 


1  Introduction 


The  distributed  computing  environments  to  which  most  users  have  access  consist  of  a  collection 
of  loosely  interconnected  hosts  running  vendor  operating  systems.  Tasks  are  initiated  indepen¬ 
dently  by  users  and  are  scheduled  locally  by  a  vendor  supplied  operating  system;  there  is  no  global 
scheduler  that  controls  access  to  the  hosts.  As  users  run  their  jobs  the  computational  load  on  the 
individual  hosts  changes  over  time. 

Deciding  how  to  map  computations  to  hosts  in  systems  with  such  dynamically  changing  loads 
(what  we  will  call  the  mapping  problem)  is  a  basic  problem  that  arises  in  a  number  of  important 
contexts,  such  as  dynamically  load-balancing  the  tasks  in  a  parallel  program  [24,  1,  26],  and 
scheduling  tasks  to  meet  deadlines  in  a  distributed  soft  real-time  system  [15, 22, 23, 17]. 

Host  load  has  a  significant  effect  on  running  time.  Indeed,  the  running  time  of  a  compute  bound 
task  is  directly  related  to  the  average  load  it  encounters  during  execution.  Determining  a  good  map¬ 
ping  of  a  task  requires  a  prediction,  either  implicit  or  explicit,  of  the  load  on  the  prospective  remote 
hosts  to  which  the  task  could  be  mapped.  Making  such  predictions  demands  an  understanding  of 
the  qualitative  and  quantitative  properties  of  load  on  real  systems.  If  the  tasks  to  be  mapped  are 
short,  this  understanding  of  load  should  extend  to  correspondingly  fine  resolutions.  Unfortunately, 
to  date  there  has  been  little  work  on  characterizing  the  properties  of  load  at  fine  resolutions.  The 
available  studies  concentrate  on  understanding  functions  of  load,  such  as  availability  [21]  or  job 
durations  [8,  18,  11].  Furthermore,  they  deal  with  the  coarse  grain  behavior  of  load  —  how  it 
changes  over  minutes,  hours  and  days. 

This  paper  is  a  first  step  to  a  better  understanding  the  properties  of  load  on  real  systems  at 
fine  resolutions.  We  collected  week-long,  1  Hz  resolution  traces  of  the  Digital  Unix  load  average 
(specifically,  an  exponential  average  with  a  five  second  time  constant)  on  over  35  different  ma¬ 
chines  that  we  classify  as  production  and  research  cluster  machines,  compute  servers,  or  desktop 
workstations.  We  collected  two  sets  of  such  traces  at  different  times  of  the  year.  The  1  Hz  sample 
rate  is  sufficient  to  capture  all  of  the  dynamic  load  information  that  is  available  to  user-level  pro¬ 
grams  running  on  these  machines.  In  this  paper,  we  present  a  detailed  statistical  analysis  of  both 
sets  of  traces  and  contemplate  the  implications  of  the  properties  we  find  for  the  mapping  problem. 
An  earlier  version  of  this  paper  [6],  concentrated  on  the  first  set  of  traces. 

The  basic  question  is  whether  load  traces  that  might  seem  at  first  glance  to  be  random  and 
unpredictable  might  have  structure  that  could  be  exploited  by  a  mapping  algorithm.  Our  results 
suggest  that  load  traces  do  indeed  have  some  structure  in  the  form  of  clearly  identifiable  proper¬ 
ties.  In  essence,  our  results  characterize  how  load  varies,  which  should  be  of  interest  not  only  to 
developers  of  mapping  and  prediction  algorithms,  but  also  to  those  who  need  to  generate  realistic 
synthetic  loads  in  simulators  or  to  those  doing  analytic  work.  Here  is  a  summary  of  our  results  and 
their  implications; 

(1)  The  traces  exhibit  low  means  but  very  high  standard  deviations  and  maximums.  Relatively 
few  of  the  traces  had  mean  loads  of  1.0  or  more.  The  standard  deviation  is  typically  at  least  as 
large  as  the  mean,  while  the  maximums  can  be  as  much  as  two  orders  of  magnitude  larger.  The 
implication  is  that  these  machines  have  plenty  of  cycles  to  spare  to  execute  jobs,  but  the  execution 
time  of  these  jobs  will  vary  drastically. 

(2)  Standard  deviation  and  maximum,  which  are  absolute  measures  of  variation,  are  positively 
correlated  with  the  mean,  so  a  machine  with  a  high  mean  load  will  also  tend  to  have  a  large  standard 
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deviation  and  maximum.  However,  these  measures  do  not  grow  as  quickly  as  the  mean,  so  their 
corresponding  relative  measures  actually  shrink  as  the  mean  increases.  The  implication  is  that  if 
the  mapping  problem  assumes  a  relative  metric,  it  may  not  be  unreasonable  to  use  the  host  with 
higher  mean  load. 

(3)  The  traces  have  complex,  rough,  and  often  multi-modal  distributions  that  are  not  well  fitted 
by  analytic  distributions  such  as  the  normal  or  exponential  distributions.  Even  for  the  traces  which 
exhibit  unimodally  distributed  load,  the  normal  distribution’s  tail  is  too  short  while  the  exponen¬ 
tial  distribution’s  tail  is  too  long.  The  implication  is  that  modeling  and  simulation  that  assumes 
convenient  analytical  load  distributions  may  be  flawed. 

(4)  Time  series  analysis  of  the  traces  shows  that  load  is  strongly  correlated  over  time.  The  au¬ 
tocorrelation  function  typically  decays  very  slowly  while  the  periodogram  shows  a  broad,  almost 
noise-like  combination  of  all  frequency  components.  An  important  implication  is  that  history- 
based  load  prediction  schemes  seem  very  feasible.  However,  the  complex  frequency  domain  be¬ 
havior  suggests  that  linear  modeling  schemes  may  have  difficulty.  From  a  modeling  point  of  view, 
it  is  clearly  important  that  these  dependencies  between  successive  load  measurements  are  captured. 

(5)  The  traces  are  self-similar.  Their  Hurst  parameters  range  from  0.73  to  0.99,  with  a  strong 
bias  toward  the  top  of  that  range.  This  tells  us  that  load  varies  in  complex  ways  on  aU  timp.  scales 
and  is  long  term  dependent.  This  has  several  important  implications.  First,  smoothing  load  by 
averaging  over  an  interval  results  in  much  smaller  decreases  in  variance  than  if  load  were  not  long 
range  dependent.  Variance  decays  with  increasing  interval  length  m  and  Hurst  parameter  H  as 

This  is  °  for  signals  without  long  range  dependence  and  to  for  the  range 

of  H  we  measured.  This  suggests  that  task  migration  in  the  face  of  adverse  load  conditions  may 
be  preferable  to  waiting  for  the  adversity  to  be  ameliorated  over  the  long  term.  The  self-similarity 
result  also  suggests  certain  modeling  approaches,  such  as  fractional  ARIMA  models  [12,  10,  3] 
which  can  capture  this  property. 

(6)  The  traces  display  epochal  behavior.  The  local  frequency  content  of  the  load  signal  re¬ 
mains  quite  stable  for  long  periods  of  time  (150-450  seconds  mean)  and  changes  abruptly  at  the 
boundaries  of  such  epochs.  This  suggests  that  the  problem  of  predicting  load  may  be  able  to  be 
decomposed  into  a  sequence  of  smaller  subproblems. 

After  completing  this  study,  we  evaluated  linear  models  for  predicting  host  load  using  the 
traces,  finding  that  relatively  simple  autoregressive  models  are  sufficient  for  short  range  host  load 
prediction  [7]. 


2  Measurement  methodology 

The  load  on  a  Unix  system  at  any  given  instant  is  the  number  of  processes  that  are  running  or 
are  ready  to  run,  which  is  the  length  of  the  ready  queue  maintained  by  the  scheduler.  The  kernel 
samples  the  length  of  the  the  ready  queue  at  some  rate  and  exponentially  averages  some  number 
of  previous  samples  to  produce  a  load  average  which  can  be  accessed  from  a  user  program.  The 
specific  Unix  system  we  used  was  Digital  Unix  (DUX). 

Unlike  many  Unix  implementations,  which  exponentially  average  with  a  timp.  constant  of  one 
minute  at  the  finest,  DUX  uses  a  time  constant  of  five  seconds.  This  small  timp  constant  allows 
us  to  capture  considerably  more  of  the  dynamics  of  load  than  would  have  been  possible  on  other 
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Unix  implementations,  and  it  minimizes  the  effect  of  phantom  correlations  due  to  the  exponential 
filter.  Interestingly,  directly  sampling  the  length  of  the  ready  queue,  which  we  tried  on  Windows 
NT,  does  not  provide  much  useful  information  because  it  is  impossible  to  sample  the  queue  fast 
enough  from  a  user  process. 

We  developed  a  small  tool  to  sample  the  DUX  load  average  at  one  second  intervals  and  log  the 
resulting  time  series  to  a  data  file.  The  1  Hz  sample  rate  was  arrived  at  by  subjecting  DUX  systems 
to  varying  loads  and  sampling  at  progressively  higher  rates  to  determine  the  rate  at  which  DUX 
actually  updated  the  value.  DUX  updates  the  value  at  a  rate  of  1  /2  Hz,  thus  we  chose  a  1  Hz  sample 
rate  by  the  Nyquist  criterion.  This  choice  of  sample  rate  means  we  capture  all  of  the  dynamic  load 
information  the  operating  system  makes  available  to  user  programs.  We  ran  this  trace  collection 
tool  on  39  hosts  belonging  to  the  Computing,  Media,  and  Communication  Laboratory  (CMCL)  at 
CMU  and  the  Pittsburgh  Supercomputing  Center  (PSC)  for  slightly  more  than  one  week  in  late 
August,  1997.  A  second  set  of  week-long  traces  was  acquired  on  almost  exactly  the  same  set  of 
machines  (35  machines  total)  in  late  February  and  early  March,  1998.  The  results  of  the  statistical 
analysis  were  similar  for  the  two  sets  of  traces. 

All  of  the  hosts  in  the  August,  1997  set  were  DEC  Alpha  DUX  machines,  running  either  DUX 
3.2  or  4.0  and  they  form  four  classes: 

•  Production  Cluster.  13  hosts  of  the  PSC’s  “Supercluster”,  including  two  front-end  machines 
(axpfea,  axpfeb),  four  interactive  machines  (axpO  through  axp3),  and  seven  batch  machines 
scheduled  by  a  DQS  [16]  variant  (axp4  through  axplO.) 

•  Research  Cluster,  eight  machines  in  an  experimental  cluster  in  the  CMCL 
(manchester-l  through  manchester-8.) 

•  Compute  servers-,  two  high  performance  large  memory  machines  used  by  the  CMCL  group 
as  compute  servers  for  simulations  and  the  like  (mojave  and  Sahara.) 

•  Desktops:  16  desktop  workstations  owned  by  members  of  the  CMCL  (aphrodite  through 
zeno.) 

The  same  hosts  were  used  for  the  March,  1998  traces,  with  the  following  exceptions: 

•  Production  Cluster:  axp9  was  replaced  by  axpl  1  due  to  hardware  failures. 

•  Desktops:  argus,  asclepius,  bruce,  cobain,  darryl,  and  hestia  were  replaced  by  belushi  and 
loman  due  to  hardware  upgrades. 

Figures  1  and  2  provide  additional  details  of  the  individual  August,  1997  and  March,  1998 
traces.  The  author  will  be  happy  to  provide  the  traces  to  any  interested  readers. 

3  Statistical  analysis 

We  analyzed  the  individual  load  traces  using  summary  statistics,  histograms,  fitting  of  analytic 
distributions,  and  time  series  analysis.  The  picture  that  emerges  is  that  load  varies  over  a  wide 
range  in  very  complex  ways.  Load  distributions  are  rough  and  frequently  multi-modal.  Even 
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Hostname 

Start  Time 

Days 

S2unples 

Production  Cluster 

axpO.psc 

The  Aug  12  21 :29: 12  EDT  1997 

15.00 

1296000 

axpl.psc 

The  Aug  12  21:30:09  EDT  1997 

14.00 

1209600 

axp2.psc 

Tue  Aug  12  21:30:53  EDT  1997 

14.00 

1209600 

axpS.psc 

The  Aug  12  21:31:13  EDT  1997 

14.00 

1209600 

axp4.psc 

The  Aug  12  21:3 1:12  EDT  1997 

14.00 

1209600 

axpS.psc 

The  Aug  12  21:31:47  EDT  1997 

14.00 

1209600 

axp6.psc 

The  Aug  12  21:31:15  EDT  1997 

15.00 

1296000 

axp7.psc 

Tue  Aug  12  20:51:19  EDT  1997 

13.00 

1123200 

axpS.psc 

Tue  Aug  12  21:31:19  EDT  1997 

14.00 

1209600 

axp9.psc 

The  Aug  12  21:31:45  EDT  1997 

14.00 

1209600 

axplO.psc 

The  Aug  12  21:31:21  EDT  1997 

14.00 

1209600 

axpfea.psc 

Sat  Aug  16  14:44:29  EDT  1997 

13.00 

1123200 

axpfeb.psc 

Sat  Aug  16  14:44:55  EDT  1997 

12.00 

1036800 

Research  Cluster 

manchester-1  .cmcl 

Sun  Aug  17  19:41:10  EDT  1997 

3.92 

338400 

manchester-2.cmcl 

Sun  Aug  17  19:41:09  EDT  1997 

4.00 

345600 

manchester-S.cmcl 

Sun  Aug  17  19:41:13  EDT  1997 

3.96 

342000 

manchester-4.cmcl 

Sun  Aug  17  19:41:10  EDT  1997 

4.00 

345600 

manchester-S.cmcl 

Sun  Aug  17  19:41:09  EDT  1997 

4.04 

349200 

manchester-6.cmcl 

Sun  Aug  17  19:41:09  EDT  1997 

4.08 

352800 

manchester-7.cmcl 

Sun  Aug  17  19:41:10EDT  1997 

4.00 

345600 

manchester-8.cmcl 

Sun  Aug  17  19:41:10  EDT  1997 

4.00 

345600 

Compute  Servers 


mojave.cmcl 

Sun  Aug  17  19:41:11  EDT  1997 

4.04 

349200 

sahara.cmcl 

Sun  Aug  17  19:41:11  EDT  1997 

4.00 

345600 

Desktops 

aphrodite.nectar 

Sun  Aug  17  19:41:12  EDT  1997 

4.00 

345600 

argus.nectar 

Sun  Aug  17  19:41:17  EDT  1997 

4.04 

349200 

asbury-park.nectar 

Sun  Aug  17  19:41:11  EDT  1997 

4.00 

345600 

asclepius.nectar 

Sun  Aug  17  19:41:07  EDT  1997 

4.08 

352800 

bruce.nectar 

Sun  Aug  17  19:41:10EDT  1997 

3.92 

338400 

cobain.nectar 

Sun  Aug  17  19:41:12  EDT  1997 

4.04 

349200 

darryLnectar 

Sun  Aug  17  19:41:32  EDT  1997 

1.71 

147600 

hawaii.cmcl 

Sun  Aug  17  19:41:11  EDT  1997 

2.63 

226800 

hestia.nectar 

Sun  Aug  17  19:41:12  EDT  1997 

4.00 

345600 

newark.cmcl 

Sun  Aug  17  19:41:13  EDT  1997 

4.00 

345600 

piyonnectar 

Sun  Aug  17  19:41:13  EDT  1997 

1.71 

147600 

rhea.nectar 

Sun  Aug  17  19:41:11  EDT  1997 

4.00 

345600 

mbix.mc 

Sun  Aug  17  19:41:13  EDT  1997 

4.00 

345600 

themis.nectar 

Sun  Aug  17  19:41:09  EDT  1997 

4.00 

345600 

uranus.nectar 

Sun  Aug  17  19:41:13  EDT  1997 

4.00 

345600 

zeno.nectar 

Sun  Aug  17  19:41:13  EDT  1997 

4.08 

352800 

Figure  1:  Details  of  the  August,  1997  traces. 
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Hostname 

Start  Time 

Days 

Samples 

Production  Cluster 

axpO.psc 

Wed  Feb  25  17:34:26  EST  1998 

12.08 

1043400 

axpl.psc 

Wed  Feb  25  17:34:25  EST  1998 

12.08 

1043400 

axp2.psc 

Wed  Feb  25  17:34:25  EST  1998 

12.08 

1043400 

axpS.psc 

Wed  Feb  25  17:34:26  EST  1998 

12.03 

1039400 

axp4.psc 

Wed  Feb  25  17:34:27  EST  1998 

12.06 

1041900 

axpS.psc 

Wed  Feb  25  17:34:27  EST  1998 

12.03 

1039700 

axp6.psc 

Wed  Feb  25  17:34:27  EST  1998 

12.08 

1043400 

axpT.psc 

Wed  Feb  25  17:34:27  EST  1998 

12.01 

1037700 

axpS.psc 

Wed  Feb  25  17:34:28  EST  1998 

12.06 

1041800 

axplO.psc 

Wed  Feb  25  17:34:28  EST  1998 

12.03 

1039700 

axpll.psc 

Wed  Feb  25  17:16:20  EST  1998 

12.03 

1039800 

axpfea.psc 

Wed  Feb  25  17:18:01  EST  1998 

12.08 

1043800 

axpfeb.psc 

Wed  Feb  25  17:23:34  EST  1998 

12.0 

1043400 

Research  Cluster 

manchester- 1  .cmcl 

Wed  Feb  25  20:42:30  EST  1998 

721900 

manchester-2.cmcl 

Wed  Feb  25  20:42:23  EST  1998 

8.36 

721900 

manchester-S.cmcl 

Wed  Feb  25  20:42:23  EST  1998 

8.36 

721900 

manchester-4.cmcl 

Wed  Feb  25  20:42:29  EST  1998 

8.35 

721300 

manchester-S.cmcl 

Wed  Feb  25  20:42:27  EST  1998 

8.36 

721900 

manchester-6.cmcl 

Wed  Feb  25  20:42:30  EST  1998 

8.36 

721900 

manchester-V.cmcl 

Wed  Feb  25  20:42:24  EST  1998 

8.35 

721800 

manchester-S.cmcl 

Wed  Feb  25  20:42:26  EST  1998 

8.35 

721800 

Compute  Servers 

mojave.cmcl 
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Figure  2:  Details  of  the  March,  1998  traces. 
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Figure  3:  Correlation  coefficients  (CCs)  between  all  of  the  discussed  statistical  properties. 

traces  with  unimodal  histograms  are  not  well  fitted  by  conunon  analytic  distributions,  which  have 
tails  that  are  either  too  short  or  too  long.  Time  series  analysis  shows  that  load  is  strongly  correlated 
over  time,  but  also  has  complex,  almost  noise-Uke  frequency  domain  behavior. 

We  summarized  each  of  our  load  traces  in  terms  of  our  statistical  measures  and  computed  their 
correlations  to  determine  how  the  measiures  are  related.  Figure  3(a)  contains  the  correlations  for 
the  August,  1997  set  while  Figure  3(b)  contains  the  correlations  for  the  March,  1998  set.  Unless 
otherwise  noted,  the  remaining  figures  in  the  paper  similarly  present  independent  results  for  the 
two  sets  of  traces.  Each  cell  of  the  tables  in  Figure  3  is  the  correlation  coefficient  (CC)  between  the 
row  measure  and  the  column  measure,  computed  over  the  the  load  traces  in  the  set.  We  will  refer 
back  to  the  highlighted  cells  (where  the  absolute  correlations  are  greater  than  0.3)  throughout  the 
paper.  It  is  important  to  note  that  these  cross  correlations  can  serve  as  a  basis  for  clustering  load 
traces  into  rough  equivalence  classes. 

Summary  statistics: 

Summarizing  each  load  trace  in  terms  of  its  mean,  standard  deviation,  and  maximum  and  minimum 
illustrates  the  extent  to  which  load  varies.  Figure  4  shows  the  mean  load  and  the  +/-  one  standard 
deviation  points  for  each  of  the  traces.  As  we  might  expect,  the  mean  load  on  desktop  machines  is 
sigmficantly  lower  than  on  other  machines.  However,  we  can  also  see  a  lack  of  uniformity  within 
each  class,  despite  the  long  duration  of  the  traces.  This  is  most  clear  among  the  Production  Cluster 
machines,  where  fewer  than  half  of  the  machines  seem  to  be  doing  most  of  the  work.  This  lack 
of  uniformity  even  over  long  time  scales  shows  clear  opportunity  for  load  balancing  or  resource 
management  systems. 

From  Figure  4  we  can  also  see  that  desktop  machines  have  smaller  standard  deviations  than  the 
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Figure  5:  COV  of  load  and  mean  load:  (a)  August,  1997  traces,  (b)  March,  1998  traces. 
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(a)  (b) 

Figure  6:  Minimum,  maximum,  and  mean  load:  (a)  August,  1997  traces,  (b)  March,  1998  traces. 

other  machines.  Indeed,  the  standard  deviation,  which  shows  how  much  load  varies  in  absolute 
terms,  grows  with  increasing  mean  load  (Figure  3  shows  CC=0.53  for  the  1997  traces  and  CC=0.72 
for  the  1998  traces.)  However,  in  relative  terms,  variance  shrinks  with  increasing  mean  load.  This 
can  be  seen  in  Figure  5,  which  plots  the  coefficient  of  variation  (the  standard  deviation  divided  by 
the  mean,  abbreviated  as  the  COV)  and  the  mean  load  for  each  of  the  load  traces.  Here  we  can 
see  that  desktop  machines,  with  their  smaller  mean  loads,  have  large  COVs  compared  to  the  other 
classes  of  machines.  The  CC  between  mean  load  and  the  COV  of  load  is  -0.49  for  the  1997  traces 
and  -0.64  for  the  1998  traces.  It  is  clear  that  as  load  increases,  it  varies  less  in  relative  terms  and 
more  in  absolute  terms. 

This  difference  between  absolute  and  relative  behavior  also  holds  true  for  the  maYiinnm  load. 
Figure  6  shows  the  minimum,  maximum,  and  mean  load  for  each  of  the  traces.  The  miniTnnm 
load  is,  not  surprisingly,  zero  in  almost  every  case.  The  maximum  load  is  positively  correlated 
with  the  mean  load  (CC=0.60  for  the  1997  traces  and  CC=0.43  for  the  1998  traces  in  Figure  3.) 
Figure  7  plots  the  ratio  max/mean  and  the  mean  load  for  each  of  the  traces.  It  is  clear  that  this 
relative  measure  is  inversely  related  to  mean  load,  and  Figure  3  shows  that  the  CC  is  -0.36  for  the 
1997  traces  and  -0.48  for  the  1998  traces.  It  is  also  important  to  notice  that  while  the  differences 
in  maximum  load  between  the  hosts  are  rather  small  (Figure  6),  the  differences  in  the  max/mean 
ratio  can  be  quite  large  (Figure  7.)  Desktops  are  more  surprising  machines  in  relative  terms. 

With  respect  to  the  mapping  problem,  the  implication  of  the  differences  between  relative  and 
absolute  measures  of  variability  is  that  lightly  loaded  (low  mean  load)  hosts  are  not  always  prefer¬ 
able  over  heavily  loaded  hosts.  For  example,  if  the  performance  metric  is  itself  a  relative  one  (that 
the  execution  time  not  vary  much  relative  to  the  mean  execution  time,  say),  then  a  more  heavily 
loaded  host  may  be  preferable. 

Distributions: 

We  next  treated  each  trace  as  a  realization  of  an  independent,  identically  distributed  (HD)  stochastic 
process.  Such  a  process  is  completely  described  by  its  probability  distribution  function  (pdf), 
which  does  not  change  over  time.  Since  we  have  a  vast  number  of  data  points  for  each  trace. 
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(a)  (b) 


Figure  7:  Maximum  to  mean  load  ratios  and  mean  load:  (a)  August,  1997  traces,  (b)  March,  1998 
traces. 


(a)  axpO  (b)  axp7 


Figure  8:  Histograms  for  load  on  axpO  and  axp7  on  August  19, 1997. 
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Mean  Load 


Figure  9:  Quantile-quantile  plots  for  axp7  trace  of  August  19, 1997. 


histograms  closely  approximate  this  underlying  pdf.  We  examined  the  histograms  of  each  of  our 
load  traces  and  fitted  normal  and  exponential  distributions  to  them.  To  illustrate  the  following 
discussion,  Figure  8  shows  the  histograms  of  load  measurements  on  (a)  axpO  and  (b)  axp7  on 
August  19,  1997  (86400  samples  each.)  AxpO  has  a  high  mezui  load,  while  axp7  is  much  more 
lightly  loaded. 

Some  of  the  traces,  especially  those  with  high  mean  loads,  have  multi-modal  histograms.  Fig¬ 
ure  8(a)  is  an  example  of  such  a  multi-modal  distribution  while  Figure  8(b)  shows  a  unimodal 
distribution.  Typically,  the  modes  are  integer  multiples  of  1.0  (and  occasionally  0.5.)  One  explana¬ 
tion  for  this  behavior  is  that  jobs  on  these  machines  are  for  the  most  part  compute  bound  and  thus 
the  ready  queue  length  corresponds  to  the  number  of  jobs.  This  seems  plausible  for  the  cluster  ma¬ 
chines,  which  run  scientific  workloads  for  the  most  part.  However,  such  multi-modal  distributions 
were  also  noticed  on  the  some  of  the  other  machines. 

The  rough  appearance  of  the  histograms  (consider  Figure  8(b))  is  due  to  the  fact  that  the  under¬ 
lying  quantity  being  measured  (ready  queue  length)  is  discrete.  Load  typically  takes  on  600-3000 
unique  values  in  these  traces.  Shannon’s  entropy  measure  [25]  indicates  that  the  load  traces  can  be 
encoded  in  1.4  to  8.5  bits  per  value,  depending  on  the  trace.  These  observations  and  the  histograms 
suggest  that  load  spends  most  of  its  time  in  one  of  a  small  number  of  levels. 

The  histograms  share  very  few  common  characteristics  and  did  not  conform  well  to  the  analytic 
distributions  we  fit  to  them.  Quantile-quantile  plots  are  a  powerful  way  to  assess  how  a  distribution 
fits  data  (cf.  [14],  pp.  196—200.)  The  quantiles  (the  ex  quantile  of  a  pdf  (or  histogram)  is  the  value  x 
at  which  lOOo;  %  of  the  probability  (or  data)  falls  to  the  left  of  x)  of  the  data  set  are  plotted  against 
the  quantiles  of  the  hypothetical  analytic  distribution.  Regardless  of  the  choice  of  parameters,  the 
plot  will  be  linear  if  the  data  fits  the  distribution. 

We  fitted  normal  and  exponential  distributions  to  each  of  the  load  traces.  The  fits  are  atrocious 
for  the  multimodal  traces,  and  we  do  not  discuss  them  here.  For  the  unimodal  traces,  the  fits  are 
slightly  better.  Figure  9  shows  quantile-quantile  plots  for  (a)  normal  and  (b)  exponential  distri¬ 
butions  fitted  to  the  unimodal  axp7  load  trace  of  Figure  8(b).  Neither  the  normal  or  exponential 
distribution  correctly  captures  the  tails  of  the  load  traces.  This  can  be  seen  in  the  figure.  The 
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Lag  (seconds) 

(C) 

Figure  10:  Hme  series  analysis  of  axp7  load  trace  collected  on  August  19,  1997:  (a)  Load  trace, 
(b)  Autocorrelation  function  to  lag  600  (10  minutes),  (c)  Partial  autocorrelation  function  to  lag  600 
(10  minutes),  (d)  Periodogram. 


quantiles  of  the  data  grow  faster  than  those  of  the  normal  distribution  toward  the  right  sides  of 
Figure  9(a).  This  indicates  that  the  data  has  a  longer  or  heavier  tail  than  the  normal  distribution. 
Conversely,  the  quantiles  of  the  data  grow  more  slowly  than  those  of  the  exponential  distribution, 
as  can  be  seen  in  Figures  9(b).  This  indicates  that  the  data  has  a  shorter  tail  than  the  exponential 
distribution.  Notice  that  the  exponential  distribution  goes  as  while  the  normal  distribution  goes 
as  e  ®  . 

There  are  two  implications  of  these  complex  distributions.  First,  simulation  studies  and  analytic 
results  predicated  on  simple,  analjdic  distributions  may  produce  erroneous  results.  Clearly,  trace- 
driven  simulation  studies  are  to  be  preferred.  The  second  implication  is  that  prediction  algorithms 
should  not  only  reduce  the  overall  variance  of  the  load  signal,  but  also  produce  errors  that  are  better 
fit  an  analytic  distribution.  One  reason  for  this  is  to  make  confidence  intervals  easier  to  compute. 
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Time  series  analysis: 


We  examined  the  autocorrelation  function,  partial  autocorrelation  function,  and  periodogram  of 
each  of  the  load  traces.  These  time  series  analysis  tools  show  that  past  load  values  have  a  strong 
influence  on  future  load  values.  For  illustration.  Figure  10  shows  (a)  the  axp7  load  trace  collected 
on  August  19,  1997,  (b)  its  autocorrelation  function  to  a  lag  of  600,  (c)  its  partial  autocorrelation 
function  to  a  lag  of  600,  and  (d)  its  periodogram.  The  analysis  of  this  trace  is  representative  of  our 
results. 

The  autocorrelation  function,  which  ranges  from  -1  to  1,  shows  how  well  a  load  value  at  time 
t  is  linearly  correlated  with  its  corresponding  load  value  at  time  t  +  A  —  in  effect,  how  well  the 
value  at  time  t  linearly  predicts  the  value  at  time  t  +  A.  Autocorrelation  is  a  function  of  A,  and  in 
Figure  10(b)  we  show  the  results  for  0  <  A  <  600.  Notice  that  even  at  A  =  600  seconds,,  values 
are  still  strongly  correlated.  This  very  strong,  long  range  correlation  is  common  to  each  of  the  load 
traces. 

The  partial  autocorrelation  function  shows  how  well  purely  autoregressive  linear  models  cap¬ 
ture  the  correlation  structure  of  a  sequence  [4],  pp.  64-69.  The  square  of  the  value  of  the  function 
at  a  lag  k  indicates  the  benefit  of  advancing  from  a  fc  —  1-th  order  model  to  a  /j-th  order  model. 
Intuitively,  if  a  A;-th  order  model  were  sufficient,  then  the  partial  autocorrelation  function  would  be 
zero  beyond  a  lag  of  k  and  the  autocorrelation  function  would  be  infinite.  In  a  dual  manner,  if  a 
A;-th  order  purely  moving  average  model  were  sufficient,  then  the  autocorrelation  function  would 
be  zero  beyond  a  lag  of  k  and  the  partial  autocorrelation  function  would  be  infinite.  As  we  can 
see  from  Figure  10(b)  and  (c),  both  functions  have  extremely  large  extents.  This  suggests  that 
mixed  models,  which  combine  autoregressive  and  moving  average  components  are  appropriate  for 
modelling  host  load. 

The  periodogram  of  a  load  trace  is  the  magnitude  of  the  Fourier  transform  of  the  load  data, 
which  we  plot  on  a  log  scale  (Figure  10(d).)  The  periodogram  shows  the  contribution  of  different 
frequencies  (horizontal  axis)  to  the  signal.  What  is  clear  in  the  figure,  and  is  true  of  all  of  the  load 
traces,  is  that  there  are  significant  contributions  from  all  frequencies  —  the  signal  looks  much  tike 
noise.  We  believe  the  two  noticeable  peaks  to  be  artifacts  of  the  kernel  sample  rate  —  the  kernel 
is  not  sampling  the  length  of  the  ready  queue  frequently  enough  to  avoid  aliasing.  Only  a  few  of 
the  other  traces  exhibit  the  smaller  peaks,  but  they  all  share  the  broad  noise-like  appearance  of  this 
trace. 

There  are  several  implications  of  this  time  series  analysis.  First,  the  existence  of  such  strong 
autocorrelation  implies  that  load  prediction  based  on  past  load  values  is  feasible.  It  also  suggests 
that  simulation  models  and  analytical  work  that  eschews  this  very  clear  dependence  may  be  in 
error.  FinaUy,  the  almost  noise-like  periodograms  suggest  that  quite  complex,  possibly  nonlinear 
models  will  be  necessary  to  produce  or  predict  load. 


4  Self-similarity 

The  key  observation  of  this  section  is  that  each  of  the  load  traces  exhibits  a  high  degree  of  self¬ 
similarity.  This  is  significant  for  two  reasons.  First,  it  means  that  load  varies  significantly  across 
all  time-scales  it  is  not  the  case  that  increasing  smoothing  of  the  load  quickly  tames  its  vaiianee. 
A  job  will  have  a  great  deal  of  variance  in  its  running  time  regardless  of  how  long  it  is.  Second,  it 
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Figure  11:  Visual  representation  of  self-similarity.  Each  graph  plots  load  versus  time  and  “zooms 
in”  on  the  middle  quarter 


Figure  12;  Meaning  of  the  Hurst  parameter:  (a)  Frequency  domain  interpretation  using  power 
spectral  analysis,  (b)  Effect  of  increased  smoothing  using  dispersional  analysis. 


suggests  that  load  is  difficult  to  model  and  predict  well.  In  particular,  self-similarity  is  indicative  of 
long  memory,  possibly  non-stationary  stochastic  processes  such  as  fractional  ARIMA  models  [12, 
10,  3],  and  fitting  such  models  to  data  and  evaluating  them  can  be  quite  expensive. 

Figure  11  visually  demonstrates  the  self  similarity  of  the  axp7  load  trace.  The  top  left  graph  in 
the  figure  plots  the  load  on  this  machine  versus  time  for  10  days.  Each  subsequent  graph  “zooms 
in”  on  the  highlighted  central  25%  of  the  previous  graph,  until  we  reach  the  bottom  right  graph, 
which  shows  the  central  60  seconds  of  the  trace.  The  plots  are  scaled  to  make  the  behavior  on  each 
time  scale  obvious.  In  particular,  over  longer  time  scales,  wider  scales  are  necessary.  Intuitively,  a 
self-similar  signal  is  one  that  looks  similar  on  different  time  scales  given  this  rescaling.  Although 
the  behavior  on  the  different  graphs  is  not  identical,  we  can  clearly  see  that  there  is  significant 
variation  on  aU  time  scales. 

An  important  point  is  that  as  we  smooth  the  signal  (as  we  do  visually  as  we  “zoom  out”  toward 
the  top  of  the  page  in  Figure  11),  the  load  signal  strongly  resists  becoming  uniform.  This  suggests 
that  low  frequency  components  are  significant  in  the  overall  mix  of  the  signal,  or,  equivalently, 
that  there  is  significant  long  range  dependence.  It  is  this  property  of  self-similar  signals  that  most 
strongly  differentiates  them  and  causes  significant  modeling  difficulty. 

Self-similarity  is  more  than  intuition  —  it  is  a  well  defined  mathematical  statement  about  the 
relationship  of  the  autocorrelation  functions  of  increasingly  smoothed  versions  of  certain  kinds  of 
long-memory  stochastic  processes.  These  stochastic  processes  model  the  sort  of  the  mechanisms 
that  give  rise  to  self-similar  signals.  We  shall  avoid  a  mathematical  treatment  here,  but  interested 
readers  may  want  to  consult  [19]  or  [20]  for  a  treatment  in  the  context  of  networking  or  [2]  for 
its  connection  to  fractal  geometry,  or  [3]  for  a  treatment  from  a  linear  time  series  point  of  view. 
Interestingly,  self-similarity  has  revolutionized  network  traffic  modelling  in  the  1990s  [9,  19  20 
28]. 

The  degree  and  nature  of  the  self-similarity  of  a  sequence  is  summarized  by  the  Hurst  param¬ 
eter,  H  [13].  Intuitively,  H  describes  the  relative  contribution  of  low  and  high  frequency  com¬ 
ponents  to  the  signal.  Consider  Figure  12(a),  which  plots  the  periodogram  (the  magnitude  of  the 
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Figure  13:  Hurst  parameter  estimates  (mean  of  four  point  estimates  and  standard  deviation)  :  (a) 
August,  1997  traces,  (b)  March,  1998  traces. 


Fourier  transform)  of  the  axp7  load  trace  of  August  19, 1997  on  a  log-log  scale.  In  this  transformed 
form,  we  can  describe  the  trend  with  a  line  of  slope  — /?  (meaning  that  the  periodogram  decays  hy- 
perbolicaUy  with  frequency  u  as  u}~^.  The  Hurst  parameter  H  is  then  defined  as  H  =  (1  +  /?)/2. 
As  we  can  see  in  Figure  12(a),  H  =  0.5  corresponds  to  a  line  of  zero  slope.  This  is  the  uncorrelated 
noise  case,  where  all  frequencies  make  a  roughly  equal  contribution.  As  H  increases  beyond  0.5, 
we  see  that  low  frequencies  make  more  of  a  contribution.  Similarly,  as  H  decreases  below  0.5, 
low  frequencies  make  less  of  a  contribution.  H  >  0.5  indicates  self-similarity  with  positive  near 
neighbor  correlation,  while  H  <0.5  indicates  self-similarity  with  negative  near  neighbor  correla¬ 
tion.  Figure  12(a)  shows  that  the  axp7  trace  is  indeed  self-similar  with  H  =  0.875.  This  method 
of  determining  the  Hurst  parameter  is  known  as  power  spectral  analysis. 

Figure  12(b)  illustrates  another  way  to  think  about  the  Hurst  parameter  H.  To  create  the  figure, 
we  binned  the  load  trace  with  increasingly  larger,  non-overlapping  bins  and  then  plotted  the  relative 
dispersion  (the  standard  deviation  normalized  by  the  mean)  of  the  binned  series  versus  the  size  of 
the  bins.  For  example,  at  a  bin  size  of  8  we  averaged  the  first  8  samples  of  the  original  series  to 
form  the  first  sample  of  the  binned  series,  the  next  8  to  form  the  second  sample,  and  so  on.  The 
figure  shows  that  the  relative  dispersion  of  this  new  8-binned  series  is  shghdy  less  than  one.  If 
the  original  load  trace  is  self-similar,  the  relative  dispersion  should  decline  hyperbolically  with 
increasing  bin  size.  On  a  log-log  scale  such  we  use  in  the  figure  this  relationship  would  appear  to 
be  linear  with  a  slope  of  H  —  1.  We  see  that  this  is  indeed  the  case  for  the  axp7  trace.  Notice  that 
this  method  for  estimating  H,  which  is  called  dispersional  analysis,  gives  a  slightly  different  H 
(0.95)  than  power  spectral  analysis.  What  is  important  is  that  in  both  figures  we  see  a  h5q)erbolic 
relationship,  and  that  both  estimates  for  H  are  much  larger  than  0.5. 

We  examined  each  of  the  load  traces  for  self-similarity  and  estimated  each  one’s  Hurst  param¬ 
eter.  There  are  many  different  estimators  for  the  Hurst  parameter  [27],  but  there  is  no  consensus 
on  how  to  best  estimate  the  Hurst  parameter  of  a  measured  series.  The  most  common  technique  is 
to  use  several  Hurst  parameter  estimators  and  try  to  find  agreement  among  them.  The  four  Hurst 
parameter  estimators  we  used  were  R/S  analysis,  the  variance-time  method,  dispersional  analy- 
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Figure  14:  Hurst  parameter  point  estimates:  (a)  August,  1997  traces,  (b)  March,  1998  traces. 


sis,  and  power  spectral  analysis.  A  description  of  these  estimators  as  well  as  several  others  may  be 
found  in  [2].  The  advantage  of  these  estimators  is  that  they  make  no  assumptions  about  the  stochas¬ 
tic  process  that  generated  the  sequence.  However,  they  also  cannot  provide  confidence  intervals 
for  their  estimates.  Estimators  such  as  the  Whittle  estimator  [3]  can  provide  confidence  intervals, 
but  a  stochastic  process  model  must  be  assumed  over  which  an  H  can  be  found  that  maximizes  its 
likelihood. 

We  implemented  the  estimators  using  Matlab  and  validated  each  one  by  examining  degenerate 
series  with  known  H  and  series  with  specific  H  generated  using  the  random  midpoint  displacement 
method.  The  dispersional  analysis  method  was  found  to  be  rather  weak  for  H  values  less  than  about 
0.8  and  the  power  spectral  analysis  method  gave  the  most  consistent  results. 

Figure  13  presents  our  estimates  of  the  Hurst  parameters  of  each  of  load  traces.  In  the  graph, 
each  central  point  is  the  mean  of  the  four  estimates,  while  the  outlying  points  are  at  +/-  one  standard 
deviation.  Figure  14  shows  the  four  actual  point  estimates  for  each  trace.  Notice  that  for  small  H 
values,  the  standard  deviation  is  high  due  to  the  inaccuracy  of  dispersional  analysis.  The  important 
point  is  that  the  mean  Hurst  estimates  are  all  significantly  above  the  H  =  0.5  line  and  their  +/-  one 
standard  deviation  points  are  also  above  the  line. 

The  traces  exhibit  self-similarity  Hurst  parameters  ranging  from  0.73  to  0.99,  with  a  strong 
bias  toward  the  top  of  that  range.  An  examination  of  the  correlation  coefficients  (CCs)  in  Figure  3 
shows  some  surprising  results.  For  the  August,  1997  traces,  the  Hurst  parameter  is  positively  cor¬ 
related  with  mean  load  (CC=0.45)  and  the  standard  deviation  of  load  (CC=0.58),  but  is  negatively 
correlated  with  the  max/mean  load  ratio  (CC=-0.49).  On  the  other  hand,  for  the  March,  1998 
traces,  the  Hurst  parameter  is  negatively  correlated  with  mean  load  (CC=-0.30)  and  the  standard 
deviation  of  load  (CC=-0.41),  but  is  positively  correlated  with  the  max/mean  ratio  (CC=0.36).  Fur¬ 
thermore,  in  the  March,  1998  traces,  we  find  the  Hurst  parameter  is  strongly  positively  correlated 
with  the  epoch  statistics,  which  is  not  the  case  at  all  for  the  August,  1997  traces.  It  is  not  clear 
what  the  cause  for  this  difference  between  the  traces  is. 

As  we  discussed  above,  self-similarity  has  implications  for  load  modeling  and  for  load  smooth¬ 
ing.  The  long  memory  stochastic  process  models  that  can  capture  self-similarity  tend  to  be  expen- 
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sive  to  fit  to  data  and  evaluate.  Smoothing  the  load  (by  mapping  large  units  of  computations  instead 
of  small  units,  for  example)  may  be  misguided  since  variance  may  not  decline  with  increasing 
smoothing  intervals  as  quickly  as  quickly  as  expected.  Consider  smoothing  load  by  averaging  over 
an  interval  of  length  m.  Without  long  range  dependence  (H  =  0.5),  variance  would  decay  with 
m  as  while  with  long  range  dependence,  as  or  and  for  the  range  of 

Hurst  parameters  we  measured. 

5  Epochal  behavior 

The  key  observation  in  this  section  is  that  while  load  changes  in  complex  ways,  the  manner  in 
which  it  changes  remains  relatively  constant  for  relatively  long  periods  of  time.  We  refer  to  a 
period  of  time  in  which  this  stability  holds  true  as  an  epoch.  For  example,  the  load  signal  could  be 
a  0.25  Hz  Sin  wave  for  a  minute  and  a  0.125  Hz  sawtooth  wave  the  next  minute  —  each  minute 
is  an  epoch.  That  these  epochs  exist  and  are  long  is  significant  because  it  suggests  that  modeling 
load  can  be  simplified  by  modeling  epochs  separately  from  modeling  the  behavior  within  an  epoch. 
Similarly,  it  suggests  a  two  stage  prediction  process. 

The  spectrogram  representation  of  a  load  trace  immediately  highlights  the  epochal  behavior  we 
discuss  in  this  section.  A  spectrogram  combines  the  frequency  domain  and  time  domain  represen¬ 
tations  of  a  time  series.  It  shows  how  the  frequency  domain  changes  locally  (for  a  small  segment 
of  the  signal)  over  time.  For  our  purposes,  this  local  frequency  domain  information  is  the  “manner 
in  which  [the  load]  changes”  to  which  we  referred  earlier.  To  form  a  spectrogram,  we  slide  a  win¬ 
dow  of  length  w  over  the  series,  and  at  each  offset  k,  we  Fourier-transform  the  w  elements  in  the 
window  to  give  us  w  complex  Fourier  coefficients.  Since  our  load  series  is  real- valued,  only  the 
first  w;/2  of  these  coefficients  are  needed.  We  form  a  plot  where  the  x  coordinate  is  the  offset  k, 
the  y  coordinate  is  the  coefficient  number,  1, 2, . . . ,  tt;/2  and  the  2;  coordinate  is  the  magnitude  of 
the  coefficient.  To  simplify  presentation,  we  collapse  to  two  dimensions  by  mapping  the  logarithm 
of  the  2  coordinate  (the  magnitude  of  the  coefficient)  to  color.  The  spectrogram  is  basically  a  mid¬ 
point  in  the  tradeoff  between  purely  time-domain  or  frequency-domain  representations.  Along  the 
X  axis  we  see  the  effects  of  time  and  along  the  y  axis  we  see  the  effects  of  frequency. 

Figure  15  shows  a  representative  case,  a  24  hour  trace  from  the  PSC  host  axp7,  taken  on  August 
19, 1997.  The  top  graph  shows  the  time  domain  representation,  while  the  bottom  graph  shows  the 
corresponding  spectrogram  representation.  What  is  important  to  note  (and  which  occurs  in  aU 
the  spectrograms  of  all  the  traces)  are  the  relatively  wide  vertical  bands.  These  indicate  that  the 
frequency  domain  of  the  underlying  signal  stays  relatively  stable  for  long  periods  of  time.  We  refer 
to  the  width  of  a  band  as  the  duration  of  that  frequency  epoch. 

That  these  epochs  exist  can  be  explained  by  programs  executing  different  phases,  programs 
being  started  and  shut  down,  and  the  Uke.  The  frequency  content  within  an  epoch  contains  energy 
at  higher  frequencies  because  of  events  that  happen  on  smaller  time-frames,  such  as  user  input, 
device  driver  execution,  and  daemon  execution. 

We  can  algorithmically  find  the  edges  of  these  epochs  by  computing  the  difference  in  adja¬ 
cent  spectra  in  the  spectrogram  and  then  looking  for  those  offsets  where  the  differences  exceed 
a  threshold.  Specifically,  we  compute  the  sum  of  mean  squared  errors  for  each  pair  of  adjacent 
spectra.  The  elements  of  this  error  vector  are  compared  to  an  epsilon  (5%  here)  times  the  mean  of 
the  vector.  Where  this  threshold  is  exceeded,  a  new  epoch  is  considered  to  begin.  Having  found 
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Figure  16:  Mean  epoch  length  +/-  one  standard  deviation:  (a)  August,  1997  traces,  (b)  March, 
1998  traces. 


the  epochs,  we  can  examine  their  statistics.  Figime  16  shows  the  mean  epoch  length  and  the  +/-  one 
standard  deviation  levels  for  each  of  the  load  traces.  The  mean  epoch  length  ranges  from  about  150 
seconds  to  over  450  seconds,  depending  on  which  trace.  The  standard  deviations  are  also  relatively 
high  (80  seconds  to  over  600  seconds.)  It  is  the  Production  Cluster  class  which  is  clearly  different 
when  it  comes  to  epoch  length.  The  machines  in  this  class  tend  to  have  much  higher  means  and 
standard  deviations  than  the  other  machines.  One  explanation  might  be  that  most  of  the  machines 
run  batch-scheduled  scientific  jobs  which  may  well  have  longer  computation  phases  and  running 
times.  However,  two  of  the  interactive  machines  also  exhibit  high  means  and  standard  deviations. 
Interestingly,  there  is  no  correlation  of  the  mean  epoch  length  and  standard  deviation  to  the  mean 
load  for  either  set  of  traces  (Figure  3.)  However,  for  the  March,  1998  traces,  we  find  correlations 
between  the  epoch  length  statistics  and  the  Hurst  parameter  that  are  particularly  strong.  It  is  not 
clear  why  this  difference  exists  between  the  traces. 

The  standard  deviations  of  epoch  length  in  Figure  16  give  us  an  absolute  measure  of  the  vari¬ 
ance  of  epoch  length.  Figure  17  shows  the  coefficient  of  variance  (COV)  of  epoch  length  and  mean 
epoch  length  for  each  trace.  The  COV  is  our  relative  measure  of  epoch  length  variance.  Unlike 
with  load  (Section  3),  these  absolute  and  relative  measures  of  epoch  length  variance  are  both  posi¬ 
tively  correlated  with  the  mean  epoch  length.  In  addition,  the  correlation  is  especially  strong  (the 
CCs  are  at  least  0.88.)  As  epoch  length  increases,  it  varies  more  in  both  absolute  and  relative  terms. 
The  statistical  properties  of  epoch  lengths  are  independent  of  the  statistical  properties  of  load. 

The  implication  of  long  epoch  lengths  is  that  the  problem  of  predicting  load  may  be  able  to  be 
decomposed  into  a  segmentation  problem  (finding  the  epochs)  and  a  sequence  of  smaller  prediction 
subproblems  (predicting  load  within  each  epoch.) 

Strictly  speaking,  epochal  behavior  means  that  load  is  not  stationary.  However,  it  is  also  not 
free  to  wander  at  will  —  clearly  load  cannot  rise  to  infinite  levels  or  fall  below  zero.  This  is 
compatible  with  the  “borderline  stationarity”  implied  by  self-similarity. 
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Figure  17:  COV  of  epoch  length  and  mean  epoch  length:  (a)  August,  1997  traces,  (b)  March,  1998 
traces. 


Lack  of  seasonality: 

It  is  important  to  note  that  the  epochal  behavior  of  the  load  traces  is  not  the  same  thing  as  season¬ 
ality  in  the  time  series  analysis  sense  [5,  4].  Seasonality  means  that  there  are  dominant  (or  at  least 
visible)  underlying  periodic  signals  on  top  of  which  are  layered  other  signals.  It  is  not  unreasonable 
to  expect  seasonality  given  that  other  studies  [21]  have  found  that  availability  of  compute  resources 
to  change  regularly  over  the  hours  of  the  working  day  and  the  days  of  the  working  week.  However, 
examination  of  the  power  spectrums  and  autocorrelations  of  the  load  traces  suggests  that  load  does 
not  exhibit  seasonality.  We  feel  this  does  not  contradict  the  earlier  results  —  the  fluctuation  of 
resources  simply  is  not  sufficiently  periodic  to  qualify  as  seasonaMty  in  the  strict  timp  series  sense. 


6  Conclusions  and  future  work 

V 

We  collected  long,  flne  grain  load  measurements  on  a  wide  variety  of  machines  at  two  different 
times  of  the  year.  The  results  of  an  extensive  statistical  analysis  of  these  traces  and  their  implica¬ 
tions  are  the  following: 

1 .  The  traces  exhibit  low  means  but  very  high  standard  deviations  and  maximnms  This  implies 
that  these  machines  have  plenty  of  cycles  to  spare  to  execute  jobs,  but  the  execution  time  of 
these  jobs  wiU  vary  drastically. 

2.  Absolute  measiues  of  variation  are  positively  correlated  with  the  mean  while  relative  mea¬ 
sures  are  negatively  correlated.  This  suggests  that  it  may  not  be  unreasonable  to  map  tasks 
to  heavily  loaded  machines  under  some  performance  metrics. 

3.  The  traces  have  complex,  rough,  and  often  multimodal  distributions  that  are  not  well  fitted  by 
anal5^c  distributions  such  as  the  normal  or  exponential  distributions,  which  are  particularly 
inept  at  captunng  the  tail  of  the  distribution.  This  implies  that  modeling  and  simulation  that 
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assumes  convenient  analytical  load  distributions  may  be  flawed.  Trace-driven  simulation 
may  be  preferable. 

4.  Load  is  strongly  correlated  over  time,  but  has  a  broad,  almost  noise-like  frequency  spectrum. 
This  implies  that  history-based  load  prediction  schemes  are  feasible,  but  that  linear  methods 
may  have  difficulty.  Realistic  load  models  should  capture  this  dependence,  or  trace-driven 
simulation  should  be  used. 

5.  The  traces  are  self-similar  with  relatively  high  Hurst  parameters.  This  means  that  load 
smoothing  will  decrease  variance  much  more  slowly  than  expected.  It  may  be  preferable 
to  migrate  tasks  in  the  face  of  adverse  load  conditions  instead  of  waiting  for  the  adversity  to 
be  ameliorated  over  the  long  term.  Self-similarity  also  suggests  certain  modeling  approaches 
such  as  fractional  ARIMA  models  [12, 10,  3]  and  non-linear  models  which  can  capture  the 
self-similarity  property. 

6.  The  traces  display  epochal  behavior  in  that  the  local  frequency  content  of  the  load  signal 
remains  quite  stable  for  long  periods  of  time  and  changes  abruptly  at  the  boundaries  of  such 
epochs.  This  suggests  that  the  problem  of  predicting  load  may  be  able  to  be  decomposed 
into  a  sequence  of  smaller  subproblems. 

Given  these  properties,  we  decided  to  study  the  performance  of  Box- Jenkins  linear  time  series 
models  [4]  and  fractional  ARFIMA  models  [10,  12,  3]  for  short  range  prediction  of  host  load. 
The  more  complex  models  do  indeed  tend  to  fit  the  data  of  a  load  trace  better  than  simple  models. 
For  example,  fractional  ARFIMA  models  had  as  much  as  40%  lower  mean  squared  error  in  some 
cases.  Surprisingly,  however,  we  found  that  simple  autoregressive  models  performed  sufficiently 
well  for  short  range  prediction  [7].  While  there  were  statistically  significant  differences  between 
the  models  we  studied,  they  were  not  sufficiently  large  to  warrant  the  use  of  more  complex  models. 
Our  recommendation  is  to  use  autoregressive  models  of  order  16  or  higher  for  prediction  horizons 
of  up  to  30  seconds. 
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